Facial expression recognition is a challenging task in computer vision. The primary reasons are class imbalance arising from data collection and uncertainty arising from inherent noise, such as ambiguous facial expressions and inconsistent labels. However, current research has focused either on class imbalance or on uncertainty, and has not addressed how to handle the two problems together. Therefore, in this paper, we propose a framework based on ResNet and attention to solve both problems. We design a weight for each class. Through this penalty mechanism, our model pays more attention to learning the classes with few samples during training, and the resulting decrease in model accuracy can be offset by a Convolutional Block Attention Module (CBAM). Meanwhile, our backbone network also learns an uncertain feature for each sample. By mixing uncertain features between samples, the model can better learn the features that are useful for classification, thus suppressing uncertainty. Experiments show that our method outperforms most baseline methods in terms of accuracy on facial expression datasets (e.g., AffectNet, RAF-DB), and that it also handles class imbalance well.
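The per-class weighting described above can be illustrated with a minimal sketch. The abstract does not state how the class weights are computed, so the inverse-frequency rule below is an assumption, and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical.

```python
import math

def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting rule: weight_c = N / (C * count_c).

    Rare classes receive larger weights, so their errors are penalised more.
    """
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    # Guard against empty classes to avoid division by zero.
    return [total / (num_classes * max(c, 1)) for c in counts]

def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted mean of per-sample negative log-likelihoods."""
    weighted_sum, weight_total = 0.0, 0.0
    for row, y in zip(logits, labels):
        # Numerically stable log-softmax for one sample.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(v - m) for v in row))
        nll = log_norm - row[y]
        w = class_weights[y]
        weighted_sum += w * nll
        weight_total += w
    return weighted_sum / weight_total

# Toy imbalanced label set: class 0 dominates, class 2 is rare.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
weights = inverse_frequency_weights(labels, num_classes=3)

# Uniform logits: every sample has the same loss, log(3).
logits = [[0.0, 0.0, 0.0] for _ in labels]
loss = weighted_cross_entropy(logits, labels, weights)
```

With this rule the rare class gets the largest weight, so a misclassified minority sample contributes more to the loss than a misclassified majority sample.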
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
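As an illustration of the patch-based strategy mentioned above for samples that are too large to process at once, the following minimal sketch computes the top-left corners of patches that tile a 2D image. The function names, patch size, and stride are hypothetical choices for illustration and are not taken from the survey.

```python
def patch_starts(length, patch, stride):
    """Start offsets along one axis so that patches cover the whole axis."""
    if patch >= length:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    # Add a final patch flush with the border if the stride left a gap.
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts

def patch_grid(height, width, patch, stride):
    """Top-left (row, col) corners of square patches tiling an image."""
    return [
        (r, c)
        for r in patch_starts(height, patch, stride)
        for c in patch_starts(width, patch, stride)
    ]

# A 10 x 10 image tiled with 4 x 4 patches and stride 4.
corners = patch_grid(10, 10, patch=4, stride=4)
```

Each corner identifies one crop that fits in memory. The last patch along each axis overlaps its neighbour so that the image border is still covered.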
Semantic communication (SemCom) and edge computing are two disruptive solutions for addressing the emerging requirements of huge data communication, bandwidth efficiency and low-latency data processing in the Metaverse. However, edge computing resources are often provided by computing service providers, and thus it is essential to design appealing incentive mechanisms for the provision of limited resources. The deep learning (DL)-based auction has recently been proposed as an incentive mechanism that maximizes the revenue while holding important economic properties, i.e., individual rationality and incentive compatibility. Therefore, in this work, we introduce the design of the DL-based auction for computing resource allocation in the SemCom-enabled Metaverse. First, we briefly introduce the fundamentals and challenges of the Metaverse. Second, we present the preliminaries of SemCom and edge computing. Third, we review various incentive mechanisms for edge computing resource trading. Fourth, we present the design of the DL-based auction for edge resource allocation in the SemCom-enabled Metaverse. Simulation results demonstrate that the DL-based auction improves the revenue while nearly satisfying the individual rationality and incentive compatibility constraints.
Edge-assisted vehicle-to-everything (V2X) motion planning is an emerging paradigm for achieving safe and efficient autonomous driving, since it leverages the global position information shared among multiple vehicles. However, due to imperfect channel state information (CSI), the position information of vehicles may become outdated and inaccurate. Conventional methods that ignore communication delays could severely jeopardize driving safety. To fill this gap, this paper proposes a robust V2X motion planning policy that adapts between competitive driving under a low communication delay and conservative driving under a high communication delay, and guarantees small communication delays at key waypoints via power control. This is achieved by integrating the vehicle mobility and communication delay models and solving a joint motion planning and power control design problem via the block coordinate descent framework. Simulation results show that the proposed driving policy achieves the smallest collision ratio compared with other benchmark policies.
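The block coordinate descent framework mentioned above alternates between optimizing one block of variables while holding the other fixed. The sketch below shows that alternation on a toy convex quadratic. It is not the paper's motion planning and power control problem, and the objective and the names `objective` and `block_coordinate_descent` are assumptions chosen only so that each block update has a closed form.

```python
def objective(x, y):
    """Toy convex objective coupling two variable blocks x and y."""
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + x * y

def block_coordinate_descent(x0, y0, sweeps=50):
    """Alternate exact minimisation over x (y fixed) and y (x fixed)."""
    x, y = x0, y0
    for _ in range(sweeps):
        # Block 1: d/dx = 2(x - 1) + y = 0  ->  x = (2 - y) / 2
        x = (2.0 - y) / 2.0
        # Block 2: d/dy = 2(y + 2) + x = 0  ->  y = -(4 + x) / 2
        y = -(4.0 + x) / 2.0
    return x, y

x_opt, y_opt = block_coordinate_descent(0.0, 0.0)
```

For this objective the joint minimizer is x = 8/3, y = -10/3, and the alternating updates converge to it because each sweep shrinks the error by a constant factor. In a joint design such as the one described above, each block update would instead be a subproblem of its own (for example, planning with power fixed, then power control with the plan fixed).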
Hyperspectral Imaging (HSI) provides detailed spectral information and has been utilised in many real-world applications. This work introduces an HSI dataset of building facades in a light industry environment with the aim of classifying different building materials in a scene. The dataset is called the Light Industrial Building HSI (LIB-HSI) dataset. This dataset consists of nine categories and 44 classes. In this study, we investigated deep learning-based semantic segmentation algorithms on RGB and hyperspectral images to classify various building materials, such as timber, brick and concrete.
Cross-lingual transfer learning without labeled target language data or parallel text has been surprisingly effective in zero-shot cross-lingual classification, question answering, unsupervised machine translation, etc. However, some recent publications have claimed that domain mismatch prevents cross-lingual transfer, and their results show that unsupervised bilingual lexicon induction (UBLI) and unsupervised neural machine translation (UNMT) do not work well when the underlying monolingual corpora come from different domains (e.g., French text from Wikipedia but English text from UN proceedings). In this work, we show that a simple initialization regimen can overcome much of the effect of domain mismatch in cross-lingual transfer. We pre-train word and contextual embeddings on the concatenated domain-mismatched corpora, and use these as initializations for three tasks: MUSE UBLI, UN Parallel UNMT, and the SemEval 2017 cross-lingual word similarity task. In all cases, our results challenge the conclusions of prior work by showing that proper initialization can recover a large portion of the losses incurred by domain mismatch.
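Reconfigurable intelligent surfaces (RIS) can significantly enhance the service coverage of Tera-Hertz massive multiple-input multiple-output (MIMO) communication systems. However, obtaining accurate high-dimensional channel state information (CSI) with limited pilot and feedback signaling overhead is challenging, which severely degrades the performance of conventional spatial division multiple access. To improve robustness against CSI imperfection, this paper proposes a deep learning (DL)-based rate-splitting multiple access (RSMA) scheme for RIS-aided Tera-Hertz multi-user MIMO systems. Specifically, we first propose a hybrid data-model driven DL-based RSMA precoding scheme, including the passive precoding at the RIS as well as the analog active precoding and the RSMA digital active precoding at the base station (BS). To realize the passive precoding at the RIS, we propose a Transformer-based data-driven RIS reflecting network (RRN). As for the analog active precoding at the BS, we propose a matched-filter-based analog precoding scheme, given that the BS and RIS adopt the LoS-MIMO antenna array architecture. As for the RSMA digital active precoding at the BS, we propose a low-complexity approximate weighted minimum mean square error (AWMMSE) digital precoding scheme. Furthermore, for better precoding performance as well as lower computational complexity, a model-driven deep unfolding active precoding network (DFAPN) is also designed by combining the proposed AWMMSE scheme with DL. Then, to acquire accurate CSI at the BS so that the RSMA precoding scheme can achieve higher spectral efficiency, we propose a CSI acquisition network (CAN) with low pilot and feedback signaling overhead, in which the downlink pilot transmission, the CSI feedback at the user equipments (UEs), and the CSI reconstruction at the BS are modeled as an end-to-end neural network based on the Transformer.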
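The proposed control method uses an adaptive feedforward-based controller to establish a passive input-output mapping for the CDPR, which is used alongside a linear time-invariant strictly positive real feedback controller to guarantee robust closed-loop input-output stability and asymptotic pose trajectory tracking via the passivity theorem. A novelty of the proposed controller is its formulation for use with a range of payload attitude parameterizations, including any unconstrained attitude parameterization, the quaternion, or the direction cosine matrix (DCM). The performance and robustness of the proposed controller are demonstrated through numerical simulations of a CDPR with rigid and flexible cables. The results demonstrate the importance of carefully defining the CDPR's pose error, which is done in a multiplicative fashion when using the quaternion and DCM, and in a specific additive fashion when using unconstrained attitude parameters (e.g., an Euler-angle sequence).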
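Several chronic lung diseases, such as idiopathic pulmonary fibrosis (IPF), are characterised by abnormal dilatation of the airways. Quantification of airway features on computed tomography (CT) can help characterise disease progression. Physics-based airway measurement algorithms have been developed, but they have met with limited success, in part because of the diversity of airway morphology seen in clinical practice. Supervised learning methods are also not feasible because of the high cost of obtaining precise airway annotations. We propose synthesising airways by style transfer using perceptual losses to train our model, the Airway Transfer Network (ATN). We compare our ATN model with a state-of-the-art GAN-based network (simGAN) using a) qualitative assessment; b) assessment of the ability of ATN-based and simGAN-based CT airway metrics to predict mortality in a population of 113 patients with IPF. ATN was shown to be quicker and easier to train than simGAN. ATN-based airway measurements were also found to be consistently stronger mortality predictors than simGAN-derived airway metrics on IPF CTs. Airway synthesis by a transformation network that refines synthetic data using perceptual losses is a realistic alternative to GAN-based methods for clinical CT analyses of idiopathic pulmonary fibrosis. Our source code can be found at https://github.com/ashkanpakzad/atn and is compatible with the existing open-source airway analysis framework, AirQuant.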
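Stack Overflow is one of the most popular programming communities, where developers can seek help for the problems they encounter. However, if inexperienced developers fail to describe their problems clearly, it is hard for them to attract sufficient attention and get the answers they expect. We propose M$_3$NSCT5, a novel approach that automatically generates multiple post titles from a given code snippet. Developers can use the generated titles to find closely related posts and complete their problem descriptions. M$_3$NSCT5 employs the CodeT5 backbone, a pre-trained Transformer model with excellent language understanding and generation ability. To alleviate the ambiguity issue, namely that the same code snippet can be aligned with different titles under different contexts, we propose the maximal marginal multiple nucleus sampling strategy, which generates multiple high-quality and diverse title candidates at a time for developers to choose from. We build a large-scale dataset of 890,000 question posts covering eight programming languages to validate the effectiveness of M$_3$NSCT5. The automatic evaluation results on the BLEU and ROUGE metrics demonstrate the superiority of M$_3$NSCT5 over six state-of-the-art baseline models. Moreover, a human evaluation with trustworthy results also demonstrates the great potential of our approach for real-world applications.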
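For readers unfamiliar with nucleus (top-p) sampling, on which the proposed strategy builds, the following minimal sketch shows a single sampling step. It is plain nucleus sampling only. The maximal marginal selection among multiple sampled candidates is not specified in the abstract and is not reproduced here, and the function names, token probabilities, and `top_p` value are hypothetical.

```python
import random

def nucleus_filter(probs, top_p):
    """Keep the smallest set of top-ranked tokens whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalise so the kept probabilities sum to one.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

def sample_token(probs, top_p, rng):
    """Draw one token from the renormalised nucleus."""
    filtered = nucleus_filter(probs, top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token distribution for the first word of a title.
next_token_probs = {"How": 0.5, "Why": 0.3, "What": 0.15, "Zebra": 0.05}
filtered = nucleus_filter(next_token_probs, top_p=0.9)
token = sample_token(next_token_probs, top_p=0.9, rng=random.Random(0))
```

Truncating the low-probability tail removes unlikely tokens while still allowing repeated draws to differ, which is what makes the technique suitable for generating several distinct candidates from the same input.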